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Abstract 


Computing  the  Singular  Value  Decomposition  (S  VD)  of  3  x  3  ma¬ 
trices  is  commonplace  in  3D  computational  mechanics  and  com¬ 
puter  graphics  applications.  We  present  a  C++  implementation  of 
implicit  symmetric  QR  SVD  with  Wilkinson  shift.  The  method  is 
fast  and  robust  in  both  float  and  double  precisions.  We  also  per¬ 
form  a  benchmark  test  to  study  the  performance  compared  to  other 
popular  algorithms. 
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1  Problem  Description 

Our  goal  is  finding  the  SVD  of  a  real  3x3  matrix  A  so  that 
A  =  U£Vt, 

where  U  and  V  are  orthogonal  matrices,  X  is  a  diagonal  matrix 
consisting  of  the  singular  values  of  A.  In  computational  mechanics, 
U  and  V  are  often  enforced  to  be  rotation  matrices  which  better 
represent  geometric  transformations.  Furthermore,  many  authors 
use  the  conventions  as  in  [Irving  et  al.  2004],  e.g.,  [Sin  et  al.  2011; 
Stomakhin  et  al.  2012;  Hegemann  et  al.  2013;  Stomakhin  et  al. 
2013;  Bouaziz  et  al.  2014;  Stomakhin  et  al.  2014;  Saito  et  al.  2015; 
Gast  et  al.  2015;  Xu  et  al.  2015;  Klar  et  al.  2016],  The  conventions 
are 


Algorithm  1  Constructing  a  3D  Givens  rotation 


1 

procedure  GivensConventional(;c,  y) 

2 

d  4—  a2  +  b2 

3 

c  <—  1 

4 

s  4—  0 

5 

if  d  ^  0  then 

>  no  tolerance  needed 

6 

t  <—  rsqrt(d ) 

l>  fast  inverse  square  root 

7 

c  i —  at 

8 

s  < - bt 

9 

return  (c,  s) 

10 

procedure  Givens  Uncon  VENTiONALfa:,  y) 

11 

d^a2  +  b2 

12 

c  4 —  0 

13 

s  <-  1 

14 

if  d  ^  0  then 

>  no  tolerance  needed 

15 

t  4—  rsqrt(d ) 

>  fast  inverse  square  root 

16 

s  <r-  at 

17 

c  4 —  bt 

18 

return  (c,  s) 

Algorithm  2  Fast  inverse  square  root  for  float 

1 

procedure  AppROXiMATERsQRT(a) 

t>  inaccurate 

2 

return  SIM D_RSQRT(a) 

3 

procedure  RsQRT(a) 

>  much  more  accurate 

4 

b  4—  ApproximateRsqrt(a) 

5 

b  <—  |(3  —  abz)b  >  Newton  step  with  f(x)  =  a  —  1/ar 

6 

return  b 

In  3D,  there  are  3  cases: 


.  UTU  =  I.  VTV  =  I; 

•  det( U)  =  1,  det(V)  =  1; 

•  <71  >  172  >  I  cr3 1 . 


Note  that  <73  <  0  if  det( A)  <  0. 


G3(l,2  ,c(x,y),s(x,y)) 


G3(2,  3,  c(x,  y),  s(x,  y)) 


2  Givens  Rotation 


G3(l,  3,  c(x,  y),  s(x,  y)) 


The  QR  algorithm  largely  depends  on  Givens  rotations.  Once  any 
c  and  s  with  c2  +  s2  =  1  are  computed  from  inputs  x  and  y,  a  2D 
Givens  rotation  is  defined  as 

G2(l,2,c(a -,,y),s(x,y))  =  ^  . 
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s  and  c  are  usually  constructed  so  that 


Sometimes  we  also  need  to  construct  them  so  that 


we  denote  this  unconventional  Givens  rotation  with  G  instead 
of  G.  Algorithm  1  shows  the  pseudocode  for  constructing  G 


Algorithm  3  Polar  Decomposition  of  2  x  2  matrices 

1:  procedure  PolarDecomposition2D(A) 

2:  X  i —  An  T  A22 

3:  y  <—  A21  —  A12 

4:  d  <—  \Jx2  +  y2 

5:  R  +-  G2(l,  2,  c  =  1,  s  =  0)  >  R  is  a  Givens  rotation 

6:  if  d  ^  0  then  l>  no  tolerance  needed 

7:  R  <—  G2(l,  2,  c  =  x/d,  s  =  —  y/d) 

8:  S  <—  RowRotation(R,  A)  >  S  =  RT  A 

9:  return  (R,  S)  >  R  is  a  rotation,  S  is  symmetric 


and  G.  Note  that  we  use  a  fast  inverse  square  root  func¬ 
tion  (Algorithm  2)  from  Streaming  SIMD  Extensions  (SSE)  in- 
trinsics  to  accelerate  the  float  case  (we  use  the  C++  function 
_mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)))).  Simi¬ 
larly  to  [McAdams  et  al.  2011],  accuracy  is  improved  by  perform¬ 
ing  an  additional  Newton  step.  In  the  case  of  double  precision,  we 
simply  use  the  standard  C++  square  root  function  to  maintain  accu¬ 
racy. 

We  further  use  the  definition  that 

•  B  =  RowRotation(G,  A)  means  B  =  Gr  A, 

•  B  =  ColumnRotation( G,  A)  means  B  =  AG. 

In  practice  these  operations  are  implemented  more  efficiently  by 
updating  four  entries  of  A  in  place  instead  of  performing  matrix 
products. 

3  SVD  of  2  x  2  Matrices 

As  the  to-be-presented  algorithm  proceeds,  the  problem  will  even¬ 
tually  degrade  into  computing  the  SVD  of  a  2  x  2  matrix.  Here  we 
briefly  describe  how  to  do  so  while  obeying  a  similar  sign  conven¬ 
tion  (U,  V  are  rotations,  <ti  >  I02I). 

Assuming  A  is  2  x  2,  the  first  step  is  computing  its  Polar  Decom¬ 
position  A  =  RS,  where  R  is  a  rotation  and  S  is  symmetric.  As¬ 
suming 


Algorithm  4  Singular  Value  Decomposition  of  2  x  2  matrices 

1:  procedure  Svd2D(A) 

2:  (R,  S)  <—  PolarDecomposition'2D(A) 

3:  if  ,S  1 2  =  0  then  t>  S  is  already  diagonal 

4:  (c,  s)  4—  (1,  0) 

5:  (cri,  <r2)  V-  (Sn,  S22) 

6:  else 

7:  T  ±(Su  -  S22) 

8:  W  <r-  \J  T2  p  S\ 2 

9:  t  —  (t  >  O)?-^2-  :  >  division  is  safe 

10:  C  <-  l/\/t2  +  1 

11:  si - tc 

12:  CTi  i —  cPtS'ii  —  2csSl2  +  s2  S22 

13:  (72  -t—  S2Sll  +  2csSl2  +  <? S22 

14:  if  ai  <  02  then  >  sorting 

15:  swap(ai,a2) 

16:  V  i—  G2(l,  2,  —s,  c) 

17:  else 

18:  V  <-  G2(1,2,c,s) 

19:  U  t-  RV 

20:  return  (U,  ai,a2,  V)  >  U,  V  are  rotations,  0+  >  |er2 1 


zero  [Golub  and  Van  Loan  2012],  Finally, 


U  =  RV, 
s  =  VTSV. 


After  sorting  cti,(72  to  obey  our  sign  convention  and  permuting 
columns  of  U  and  V  accordingly,  we  have  A  =  USVT  where 


.  UTU  =  I,  VTV  =  I; 

•  def(U)  =  1,  det(V)  =  1; 

•  <71  >  |er2 1. 


requiring  RrA  being  symmetric  leads  to  xs  =  yc  where  x  = 
An  +  A22,  y  =  A21  —  A12.  The  two  solutions  are  therefore 


By  taking  the  difference  of  ||S  —  I|j^,  from  two  solutions,  it  can 
be  shown  that  choosing  the  first  one  always  minimizes  it,  therefore 
guarantees  the  chosen  R  is  the  closest  rotation  to  A  (or  S  is  the 
closest  symmetric  matrix  to  I). 

Once  we  have  the  symmetric  matrix  S,  diagonalizing  it  with  a  Ja¬ 
cobi  rotation  can  be  done  similarly  by  solving  c  and  s  from 


(  c  s\  (Su  Si2\  (c  -s\ 
\-s  c]\Sl2  S22J  \s  c) 


2x2  Polar  Decomposition  and  SVD  are  shown  in  Algorithm  3  and 

4. 


4  Implicit  Symmetric  QR  SVD 


The  QR  algorithm  iteratively  applies  Givens  rotations  to  a  tridi¬ 
agonal  symmetric  matrix  (which  in  the  SVD  case  corresponds  to 
T  =  AtA)  to  solve  the  symmetric  eigenproblem.  Instead  of  con¬ 
structing  T,  implicit  symmetric  QR  SVD  works  on  an  upper  bidi¬ 
agonal  A  and  implicitly  does  the  same  thing.  This  results  in  a  much 
higher  accuracy  and  improves  efficiency  [Golub  and  Van  Loan 
2012], 


4.1  Bidiagonalization  and  Zerochasing 


Attentions  need  to  be  paid  to  prevent  potential  division  by 


The  implicit  symmetric  QR  algorithm  starts  with  making  A  upper 
bidiagonal.  For  3x3  matrices,  this  can  be  done  with  4  Givens 


Algorithm  5  Zerochasing:  Assuming  input  A31  —  0,  U,  V  are  ro¬ 
tations,  this  function  makes  A  upper  bidiagonal  while  maintaining 
the  product  UAVT  unchanged 

1 :  procedure  Zerochasing(U,  A,  V)  t>  update  them  in  place 

2:  G  Gs(2,  3,  x  =  A12,  y  =  A13) 

3:  A<-  AG,  U  4-  GtU 

4:  G  4—  Gs(2, 3, x  =  A12, y  =  A13) 

5:  A  -4—  GTA,  V  4-  GTV 

6:  G  4  Gs(2, 3,  x  —  A22,  y  —  A32) 

7:  A  4—  GTA,  U  4-  UG 

8:  return  (U,  A,  V) 


Algorithm  6  Upper  Bidiagonalizing:  Assuming  input  U,  V  are  ro¬ 
tations,  this  function  makes  A  upper  bidiagonal  while  maintaining 
the  product  UAVT  unchanged 

1 :  procedure  BidiagonalizeCU,  A,  V)  o  update  them  in  place 
2:  G  4—  G3  (2,3,®  =  A21 ,  y  =  A31 ) 

3:  A  4—  GtA,  U  4-  UG 

4:  (U,  A,  V)  4—  Zerochasing (\J ,  A,  V) 

5:  return  (U,  A,  V) 


rotation.  Starting  with  A111-*  =  A. 


A'1).gW(2,3,x  =  4°1),!/  =  40i))TAII))  = 


'  *  *  *  ' 
*  *  * 

4-  *  *V 


AW=GW(l,2,a:  =  4i),y  =  41i))TAW-|-  *  * 


A(3)  =  A(2)Gf)(2,3,a:  =  A(j22\y  =  A^)  =  |  * 

A^=G^(2,3  ,x  =  A&\y  =  A®)TA™  = 


*  *  —  > 
* 

*  *  i 


*  *  ' 
*  * 
—  *  t 


In  summary,  A(4)  =  G^Gf  ^G^  A(0)g!,3).  The  Givens 
rotations  need  to  be  absorbed  by  U  and  V  accordingly  during  the 
process.  The  later  three  steps  of  this  process  is  further  called  Ze¬ 
rochasing,  which  takes  a  matrix  of  form 


and  make  it 


We  will  be  using  it  again  in  every  implicit  symmetric  QR  iteration 
(see  Section  4.2).  We  summarize  the  algorithms  for  Zerochasing 
and  upper  bidiagonalization  in  Algorithm  5  and  6. 


4.2  Implicit  Symmetric  QR  SVD  with  Wilkinson  Shift 

Our  algorithm  follows  [Golub  and  Van  Loan  2012].  Starting  front 
U  =  I  and  V  =  I,  we  first  perform  the  upper  bidiagonaliza¬ 
tion  described  in  Section  4.1  to  matrix  A  with  U  and  V  also  up¬ 
dated.  We  use  B  to  denote  the  bidiagonal  matrix,  where  we  have 


UBVt  =  A.  The  implicit  QR  iteration  operates  on  B  iteratively 
and  update  U  and  V  on  the  fly.  Denoting  B  with 

(a  1/81  \ 

B  =  I  a.2  P2  I  , 

V  qv 

the  corresponding  symmetric  eigenproblem  is  on  the  matrix 

/  a?  aiPi 

T  =  BrB  =  B  =  j  aipi  a\  +  Pi  012P2 

\  CX2P2  C&,  +  P2 

QR  iteration  seeks  to  eliminate  the  off-diagonal  entries  of  T. 
Equivalently,  one  or  more  values  of  (ai,  pi,  a.2 ,  P2)  will  converge 
to  something  close  to  zero.  We  will  show  in  Section  4.3  that  once 
any  of  (qi,  /3i,  Q2,  P2,  0:3)  becomes  smaller  than  a  tolerance  r,  we 
can  terminate  the  QR  iterations  and  degrade  the  problem  to  a  2  x  2 
SVD.  The  termination  tolerance  r  is  computed  as  a  relative  toler¬ 
ance  via 

r  =  max( -  ||B||F  ,  1)?? 

where  we  choose  r/  =  128e  and  e  is  the  floating  point  machine 
epsilon. 

QR  Factorization  The  QR  Factorization  of  a  symmetric  tridiag¬ 
onal  matrix  T  £  RriX”  can  be  easily  done  using  n  —  1  Givens 
rotations  with  Q  being  a  rotation  matrix  and  R  being  upper  trian¬ 
gular. 

QR  Iteration  If  A  G  Rnxn  is  symmetric,  Ro  is  orthogonal  and 
To  =  Ro' ARo,  then  the  iteration 

Tfc_i  =  QfcRfc, 

Tfc  =  RfcQfc 

implies  =  (R0R1  . . .  Rfc)T  A(RqRi  . . .  Rfc)  is  symmetric 
tridiagonal,  and  converges  to  a  diagonal  form  [Trefethen  and  Bau  III 
1997;  Golub  and  Van  Loan  2012]!" 

Implicit  Q  Theorem  Given  A  e  Rnxn  symmetric,  QrAQ  = 
T,  VrAV  =  S.  Q  and  V  are  orthogonal,  T  and  S  are  sym¬ 
metric  tridiagonal.  If  A  is  unreduced  (meaning  it  has  non-zero 
sub-diagonal  entries)  and  the  first  column  of  Q  and  V  are  equal 
(qi  =  vi),  then  qi  =  ±v,  and  T(J  |  =  |  S) ,  |  [Golub  and  Van  Loan 
2012], 

Explicit  Shifted  QR  Iteration  If  y  is  a  good  approximate  eigen¬ 
value  of  T,  then  T„jn_i  tends  to  become  smaller  after  a  shifted 
QR  step: 


T  —  fil  =  QR, 

Tnew  =  RQ  +  /;I  =  QtTQ 

and  T  maintains  a  symmetric  tridiagonal  form  [Golub  and 
Van  Loan  2012]. 

Wilkinson  Shift  A  good  choice  of  the  shift  //  is  the  eigenvalue 
of  T's  bottom  right  2x2  block  that  is  closer  to  Tnn  [Golub  and 
Van  Loan  2012],  This  shift  gives  average  cubic  convergence  rate 
for  reducing  Tn,n-i  to  zero.  In  the  3x3  case  where 

(01  bi 

fel  02  62 

&2  a3 


the  shift  is  given  by  fj,  =  <13  +  d  —  signed)  \J  d?  +  6|  where  d  = 
(02  —  a3)/2  and  sign(d)  =  ±1  (choose  1  when  ci  =  0). 

Implicit  Shifted  QR  Iteration  The  shifted  QR  iteration  can  be 
done  without  constructing  T  —  gl  explicitly.  Let’s  focus  on  the 
3x3  case  where  we  have 

(01  -  p  bi 

bi  a  2  —  /.t  b2 

62  03  —  H 

The  QR  decomposition  T  —  //I  =  QR  looks  like 

(ai  —  n  bi  \  /*  *  *\ 

bi  0,2  -  n  b2  )  =  (qi  q2  q3)  *  *  )  , 

62  a3  -  gj  \  *) 

this  implies  qi  =  7(01  —g,  61 , 0)T  for  some  normalization  scale  7. 
If  we  construct  a  Givens  rotation  G1  =  G3(l,  2,  x  =  ai  —  g,  y  = 
61),  then  it  follows  g}  =  lo(cii  —  li,  61,  0)T  for  some  normaliza¬ 
tion  scale  00.  Therefore  we  know  g]  =  qi,  i.e.,  G1  and  Q  has  the 
same  first  column.  If  we  further  find  G2  such  that  Z  =  G1G2  has 
the  same  first  column  with  G1  and  S  =  ZrTZ  is  symmetric  tridi¬ 
agonal,  then  by  implicit  Q  Theorem,  since  Tnew  =  QrTQ  and 
S  =  ZtTZ,  it  follows  qi  =  ±z;  and  |Ti"e“'|  =  |5ij|.  Therefore, 
utilizing  G1  and  G2  accomplishes  the  same  effect  as  an  explicit 
shifted  QR  iteration  step  for  updating  T. 

Implicit  Shifted  QR  in  the  SVD  Case  For  SVD,  we  prefer  op¬ 
erating  on  B  directly  to  constructing  T.  Applying  G1  directly  to 
B  followed  by  Zerochasing  B  back  to  upper  bidiagonal  is  equiva¬ 
lent  to  doing  implicit  QR  on  T  [Golub  and  Van  Loan  2012],  More 
specifically  in  our  3x3  case,  after  applying  Gi  as  a  column  rota¬ 
tion  to  B.  the  column  rotation  in  the  Zerochasing  (i.e..  G§  in  Sec¬ 
tion  4.1)  essentially  is  the  G2  we  want  to  find  in  the  implicit  QR 
for  T  with  the  property  that  G2G2  has  the  same  first  column  with 
Q.  Therefore  by  operating  on  B  directly,  the  implicit  symmetric 
QR  algorithm  is  correctly  applied. 

We  summarize  the  implicit  shifted  QR  SVD  in  Algorithm  7.  The 
steps  after  exiting  the  loop  is  described  in  Section  4.3. 

4.3  Postprocess  and  Sorting 

If  any  a  or  /3  from  Algorithm  7  becomes  small,  implicit  QR  itera¬ 
tion  is  terminated.  Here  we  show  how  each  case  is  degraded  to  a 
2x2  easy  problem. 

4.3.1  Deflation  Cases 

Case  1 :  I/82I  <  t.  In  this  case 


we  just  need  to  compute  the  2x2  SVD  of  the  top  left  sub-matrix 
and  assemble  back  to  3D  with 


Algorithm  7  Implicit  Shifted  QR  SVD  of  3  x  3  matrices 

1:  procedure  Svd3D(A)  l>  return  U,  S,  V 

2:  B  <—  A 

3:  U.V^I 

4:  Bidiagonalize( U,  B,V)  t>  B  is  now  upper  bidiagonal 

5:  (cti,  a2,  a3)  <—  (B11,  B22,  B33) 

6:  (/3i,  /32)  <—  (-B12,  -B23) 

7:  (71,72)  <-  {ctiPi,a2P2) 

8:  7]  3—  128e 

9:  r  <—  7inax(0.5  ||B|jF  ,  1) 

10:  while  \/32\,  |/3i|,  |ai|,  |a2|,  |a3|  >  r  do 

11:  ai^ai+/3f 

12:  a2  <—  CX3  +  02 

13:  61  <—  72 

14:  d  (01  —  a2)/2 

15:  g  copysign(b\/{\d\  +  \J d?  +  6j),d) 

16:  G  <—  G3(l,  2,  a)  —  /i,  71) 

17:  B<-BG 

18:  V<-VG 

19:  Zerochasing(XJ ,  B,  V) 

20:  (ai,  Q2,  a3)  (Bn,  B22,  B33) 

21:  (/3i ,  P2)  <—  (R12,  B23) 

22:  (71,72)  (ai/3i,  0:2^2) 

23:  (U,  S,  V)  <—  Postprocess(B,  U,  V,  a,  j3 )  >  section  4.3 

24:  return  (U,  S,  V)  >  U,  V  are  rotations,  cr\>  a2>  |cr3| 


Case  2:  |/5i|  <  r.  In  this  case 


we  just  need  to  compute  the  2x2  SVD  of  the  bottom  right  sub¬ 
matrix  and  assemble  back  to  3D. 

Case  3:  | «2 1  <  r.  In  this  case 


performing  an  unconventional  Givens  rotation  G  =  G3(2, 3,  x  = 
B23,  y  =  B33)  with  B  GTB  reduces  B  to  the  form 


where  we  just  need  to  compute  the  2x2  SVD  of  the  top  left  sub¬ 
matrix  and  assemble  back  to  3D. 

Case  4:  |a3|  <  r.  In  this  case 


We  can  use  G  =  G3(2,  3,  x  =  B22,y  =  B23)  with  B  BG  to 
reduce  B  to  the  form 


Algorithm  8  Postprocessing  after  QR  Iterations 

1:  procedure  Postprocess((B,  U,  V,  a,  /?))  >  return  U,  £,  V 
2:  if  |/?2 1  <  t  then 

3:  SolveReducedTopLeft(B,U ,  £,V) 

4:  SortW ithTopLeftSub (U ,  £ ,  V) 

5:  else  if  |/3i|  <  r  then 

6:  Solve  Reduced BotRight(B,  U,  £,  V) 

7:  SortW ithBotRightSub(\] ,  £,V) 

8:  else  if  |«2|  <  r  then 

9:  G  4— 63(2,  3,  a;  = -B23, 2/ =  B33) 

10:  B  4-  GTB 

11:  U<-UG 

12:  SolveReducedTopLeft(B,XJ ,  £,V) 

13:  SortWithTopLeftSub(  U,£,V) 

14:  else  if  1 0:3 1  <  r  then 

15:  G  4—  Gs(2, 3,x  =  B22,  y  =  B23) 

16:  B  4—  BG 

17:  V^-VG 

18:  G  4—  G3  (1,  3,  x  =  -Bn ,  y  =  B13) 

19:  B  4—  BG 

20:  V^VG 

21:  SolveReducedTopLeft(B,XJ ,  £,V) 

22:  SortW ithTopLeftSub(  U,S,V) 

23:  else  if  |ai|  <  r  then 

24:  G  4—  63(1,  2,  x  =  -B12,  y  =  -B22) 

25:  B  4-  GTB 

26:  IJ  4-  UG 

27:  G  4—  63(1,  3,  x  =  -B13,  y  =  B33) 

28:  B  4-  GTB 

29:  U  4—  UG 

30:  SolveReducedBotRight(B ,  U,  £,  V) 

31:  SortWithBotRightSub(\J ,  £,V) 

32:  return  (U,  S,  V)  >U,V  are  rotations,  cri  >  4x2  >  |o\3| 


followed  by  G  =  G3(l,  3,  x  =  Bn,  y  =  B 13)  with  B  4—  BG  to 
further  reduce  to 


where  we  just  need  to  compute  the  2x2  SVD  of  the  top  left  sub¬ 
matrix  and  assemble  back  to  3D. 

Case  5:  |ori  |  <  t.  In  this  case 


Performing  an  unconventional  Givens  rotation  G  =  G3(l  ,2,x  = 
B 12,  y  =  B22)  with  B  4—  GTB  reduces  B  to  the  form 


Further  performing  an  unconventional  Givens  rotation  G  = 
Gr3(l,3, x  =  Bi3,y  =  B33)  with  B  4—  GTB  reduces  B  to  the 
form 


where  we  just  need  to  compute  the  2x2  SVD  of  the  bottom  right 
sub-matrix  and  assemble  back  to  3D. 


Algorithm  9  Solve  Reduced  SVD  Problem 

1:  procedure  SolveReducedTopLeft(B,  U,  £,  V) 

2:  o~3  4  B33 

3:  u  4 —  G2(l,  2) 

4:  v  4 —  Go  (1,  2) 

5:  (u,ai,<T2,v)  =  Svd'2D((TopLeftBlock(B)) 

6:  u  4—  G3(l,  2,  c  =  u.c,  s  =  u.s) 

7:  v  4—  G3(l,  2,  c  =  v.c,  s  =  v.s) 

8:  U  4-  Uu 

9:  V4-Vv 

10:  return  (U,S,V)  >U,V  are  rotations,  01  >  |o"2 

11:  procedure  SolveReducedBotRight(B,  U,  S,  V) 

12:  01  4-  Bn 

13:  u  4 —  G2(l,  2) 

14:  v  4 —  G2(l,  2) 

15:  (u,  02,  &3,  v)  =  Svd'2D((BotRightBlock(H)) 

16:  u  4—  G3(2, 3,  c  =  u.c,  s  =  u.s) 

17:  v  4—  G3(2,  3,  c  =  v.c,  s  =  v.s) 

18:  U  4—  Uu 

19:  V4-Vv 

20:  return  (U,S,V)  >U,V  are  rotations,  02  >  1 0-3 1 


4.3.2  Sorting 

After  processing  the  reduced  2x2  cases,  the  full  £  needs  to  be 
carefully  sorted  to  obey  our  sign  convention.  We  give  out  the  pseu¬ 
docode  for  the  full  postprocessing  (including  sorting)  in  Algorithm 
8,  9,  and  10. 

5  Performance 

Our  algorithm  is  implemented  on  top  of  the  Eigen  C++  template 
library  for  its  automatic  vectorization  of  certain  matrix  operations. 
The  fast  inverse  square  root  function  is  the  only  explicit  dependency 
of  SSE  intrinsics,  which  is  pretty  standard  on  modern  CPUs.  We 
test  our  code  on  a  12-core  3.47GHz  Intel  Xeon  X5690  server  using 
the  g++  compiler  for  Linux,  version  5.3.0.  We  use  the  following 
data  set  as  our  test  cases: 

1.  Totally  1024  x  1024  =  1, 048,  576  random  real  matrices  with 
each  entry  uniformly  sampled  from  —3.0  to  3.0. 

2.  All  integer  entry  matrices  with  each  entry  ranging  from  —2  to 
2  (totally  59  =  1,  953, 125  matrices). 

3.  Starting  from  the  integer  entry  matrices,  we  perturb  all  entries 
with  random  numbers  from  — 256e  to  256e  (where  t  is  the 
floating  point  machine  precision).  We  generate  4  perturbed 
cases  for  each  integer  matrix,  resulting  in  4x59  =  7,812,500 
perturbed  integer  matrices. 

4.  We  perturb  the  all  entries  of  an  identity  matrix  with  ran¬ 
dom  numbers  from  — 256e  to  256e.  Totally  1024  x  1024  = 
1,  048,  576  matrices  are  generated. 

5.  We  perturb  the  all  entries  of  an  identity  matrix  with  random 
numbers  from  —0.001  to  0.001.  Totally  1024  x  1024  = 
1,  048,  576  matrices  are  generated. 

The  random  integer  matrix  and  perturbed  integer  test  cases  show 
the  performance  on  singular  and  nearly  singular  matrices.  Matrices 
close  to  the  identity  are  frequently  obtained  in  simulations,  partic¬ 
ularly  when  warm  starts  are  used.  We  run  all  tests  with  float  and 
double  precisions,  and  compare  the  results  with  the  SVD  described 
in  [Irving  et  al.  2004]  (implemented  in  the  PhysBAM  library  and 
imported  to  our  Eigen  code  base),  Jacobi  SVD  (built  in  Eigen),  and 
the  one  in  Vega  FEM  library  [Barbie  et  al.  2012]  (See  Figure  1  and 
2).  We  don’t  directly  compare  with  [McAdams  et  al.  2011]  because 
their  implementation  is  explicitly  tailored  to  the  characteristics  of 


QR  SVD  o  ITF  04  Eigen  Jacobi  o  Vega  FEM 


QR  SVD  o  ITF  04  Eigen  Jacobi  o  Vega  FEM 


Timing  (float) 

ITF  04  Eigen  Jacobi 


0.3438 

0.5669 


0.1899 

0.2806 


OR  SVD 

0.6445 

1.0335 

4.4722 

0.2952 

0.6123 


0.3637  | 
0.6597 
2.6397 
0.3619 
0.3619 


1.4422 

2.5292 

10.3638 

0.9887 

1.1439 


Timing  (double) 

ITF  04  Eigen  Jacobi 

0.3517  !  1.8252 

0.6354  3.1482 


2.5336 

0.3481 

0.3481 


12.9572 

1.0295 

1.6234 


0.6079 

0.6182 


Vega  FEM 

0.6367 

1.0834 

4.4881 

0.4800 

0.6141 


Test  cases  (float) 

Figure  1:  Timing  comparisons  on  all  5  tests  with  float  and  double  precisions. 


Reconstruction  Maximum  Error  (float) 


QR  SVD 

ITF  04 

Eigen  Jacobi 

Vega  FEM 

1 

4.965E-05 

7.153E-07 

7.153E-06 

7.153E-07 

2 

3.123E-05 

1 .968E-04 

4.530E-06 

4.768E-07 

3 

4.035E-05 

2.896E-04 

5.007E-06 

1 .986E-06 

4 

1.542E-05 

2.384E-07 

1.609E-06 

2.384E-07 

5 

1.528E-05 

2.384E-07 

1.907E-06 

2.384E-07 

Reconstruction  Maximum  Error  (double) 

QR  SVD 

ITF  04 

Eigen  Jacobi 

Vega  FEM 

1 

8.971  E-14 

8.984E-10 

1.332E-14 

3.613E-10 

2 

5.351  E-1 4 

1 .968E-04 

8.438E-15 

6.322E-08 

3 

7.471  E-14 

3.189E-04 

1.021  E-14 

8.099E-08 

4 

2.850E-14 

3.520E-15 

2.887E-15 

2.442E-15 

5 

2.820E-14 

3.113E-14 

4.219E-15 

2.665E-15 

Figure  2:  Maximum  reconstruction  error  is  computed  as  the  maxi¬ 
mum  absolute  value  of  entries  in  USVT  —  A. 


SIMD  or  vector  processors,  and  likely  runs  faster  for  a  large  array 
of  3  x  3  matrices  in  parallel. 

We  observe  that  implicit  QR  SVD  has  consistent  accuracy  for  dif¬ 
ferent  test  cases.  The  accuracy  can  be  adjusted  further  by  changing 
the  QR  iteration  tolerance  and  reach  the  machine  precision.  The 
algorithm  of  [Irving  et  al.  2004]  becomes  less  accurate  in  certain 
degenerate  cases  even  with  double  precision.  This  is  largely  due  to 
the  loss  of  information  form  constructing  AT  A  explicitly. 

For  the  tests  we  performed,  implicit  QR  SVD  is  the  fastest  in  floats, 
and  comparable  to  [Barbie  et  al.  2012]  in  doubles.  Notably,  the 
Jacobi  SVD  built  in  Eigen  is  much  slower  than  all  other  algorithms. 
This  is  partially  due  to  their  implementation  is  for  general  matrices 
with  arbitrary  dimension.  A  specifically  designed  algorithm  such 
as  in  [McAdams  et  al.  2011]  will  probably  improve  it. 

In  summary,  the  Implicit  QR  SVD  described  in  this  document  pro¬ 
vides  a  nice  balance  between  speed  and  accuracy.  We  release  our 
C++  code  together  with  this  document  and  expect  it  to  benefit  many 
applications  in  computer  graphics  and  computational  solid  mechan¬ 
ics. 
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